GoSafeOpt: Scalable safe exploration for global optimization of dynamical systems
Abstract
Learning optimal control policies directly on physical systems is challenging because even a single failure can lead to costly hardware damage. Most existing model-free learning methods that guarantee safety, i.e., no failures, during exploration are limited to local optima. This work proposes GoSafeOpt as the first provably safe algorithm that can safely discover globally optimal policies for systems with a high-dimensional state space. We demonstrate the superiority of GoSafeOpt over competing methods in simulation experiments on a robot arm.
Similar resources
Safe Exploration for Identifying Linear Systems via Robust Optimization
Safely exploring an unknown dynamical system is critical to the deployment of reinforcement learning (RL) in physical systems where failures may have catastrophic consequences. In scenarios where one knows little about the dynamics, diverse transition data covering relevant regions of state-action space is needed to apply either model-based or model-free RL. Motivated by the cooling of Google’s...
Projected Dynamical Systems and Optimization Problems
We establish a relationship between general constrained pseudoconvex optimization problems and globally projected dynamical systems. A corresponding novel neural network model, which is globally convergent and stable in the sense of Lyapunov, is proposed. Both theoretical and numerical approaches are considered. Numerical simulations for three constrained nonlinear optimization problems a...
Global Optimization using a Dynamical Systems Approach
We develop new algorithms for global optimization by combining well known branch and bound methods with multilevel subdivision techniques for the computation of invariant sets of dynamical systems. The basic idea is to view iteration schemes for local optimization problems — e.g. Newton’s method or conjugate gradient methods — as dynamical systems and to compute set coverings of their fixed poi...
Safe Exploration for Optimization with Gaussian Processes
We consider sequential decision problems under uncertainty, where we seek to optimize an unknown function from noisy samples. This requires balancing exploration (learning about the objective) and exploitation (localizing the maximum), a problem well-studied in the multiarmed bandit literature. In many applications, however, we require that the sampled function values exceed some prespecified “...
Observational Dynamical Systems
In this thesis we first study fuzzy metric spaces from an observational viewpoint. Fuzzy metric spaces and the topology generated by this metric are introduced. Then, based on the spaces introduced in the first chapter, topological chaos, minimality, and intersecting sets are examined in several ways. In the third chapter, the notion of fuzzy attractor sets is defined as a basic concept in relative semi-dynamical systems. ...
Journal
Journal title: Artificial Intelligence
Year: 2023
ISSN: 2633-1403
DOI: https://doi.org/10.1016/j.artint.2023.103922